302 research outputs found

    Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors

    Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances, where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high-performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high-performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the target processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, show that our cache-aware versions of gemm with asymmetric scheduling attain important gains in performance with respect to their architecture-oblivious counterparts, while exploiting all the resources of the AMP to deliver considerable energy efficiency.

    The CHAIN-REDS Semantic Search Engine

    e-Infrastructures, and in particular Data Repositories and Open Access Data Infrastructures, are essential platforms for e-Science and e-Research, and have been under construction for several years both in Europe and in the rest of the world to support diverse multi- and inter-disciplinary Virtual Research Communities. So far, however, it has been difficult for scientists to correlate papers with the datasets used to produce them and to discover data and documents easily. In this paper, the CHAIN-REDS project's Knowledge Base and its Semantic Search Engine are presented, which attempt to address those drawbacks and contribute to the reproducibility of science.

    A Review of Lightweight Thread Approaches for High Performance Computing

    High-level, directive-based solutions are becoming the programming models (PMs) of multi/many-core architectures. Several solutions relying on operating system (OS) threads work perfectly well with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massively parallel architectures, and conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared, offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of patterns commonly found in current parallel codes. Moreover, we study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns and that those LWT libraries perform better than OS threads-based solutions in cases where task and nested parallelism are becoming more popular with new architectures.
    The researchers from the Universitat Jaume I de Castelló were supported by project TIN2014-53495-R of the MINECO, the Generalitat Valenciana fellowship programme Vali+d 2015, and FEDER. This work was partially supported by the U.S. Dept. of Energy, Office of Science, Office of Advanced Scientific Computing Research (SC-21), under contract DE-AC02-06CH11357. We gratefully acknowledge the computing resources provided and operated by the Joint Laboratory for System Evaluation (JLSE) at Argonne National Laboratory.
    Peer reviewed. Postprint (author's final draft).

    Study of Zr, Cd and Ag Ions by Laser-Induced Breakdown Spectrometry (Estudio de iones de Zr, Cd y Ag mediante espectrometría de ruptura inducida por láser)

    In this work, the ions Zr III, Cd II and Ag II were studied using laser-induced breakdown spectrometry (LIBS). To this end, a system for acquiring emission spectra of laser-produced plasmas was set up, and its spectral response was obtained over the range from 1900 to 7000 Å. A spectroscopic study of the different plasmas used was carried out, determining parameters such as their composition, their temperature, their electron density and the self-absorption they exhibited. These parameters were also used to determine whether the plasmas were in local thermodynamic equilibrium and whether they were optically thin. Transition probabilities were measured experimentally for the transitions originating from the 4d5d and 4d5p levels of Zr III; from the 5p, 5d, 6s, 4d⁹5s², 6p, 4d⁹5s5p, 4f, 7p, 5f and 8p levels of Cd II; and from the 5s² and 6s ³D₃ levels of Ag II. These experiments were performed with zirconium oxide and a Zr-Cu alloy for Zr III, with pure cadmium and a Cd-Zn alloy for Cd II, and with pure silver for Ag II. The transition probabilities and lifetimes of the aforementioned Zr III and Cd II levels were also calculated theoretically using the relativistic Hartree-Fock method with configuration mixing.

    Montera: A Framework for Efficient Execution of Monte Carlo Codes on Grid Infrastructures

    The objective of this work is to improve the performance of Monte Carlo codes on Grid production infrastructures. To do so, the codes and the grid sites are characterized with simple parameters that model their behavior. Then, a new performance model for grid infrastructures is proposed, and an algorithm that employs this information is described. This algorithm dynamically calculates the number and size of tasks to execute on each site in order to maximize performance and reduce makespan. Finally, a newly developed framework called Montera is presented. Montera handles the execution of Monte Carlo codes in an unattended way, isolating the complexity of the problem from the end user. By employing two fusion Monte Carlo codes as example cases, along with the described characterizations and scheduling algorithm, a performance improvement of up to 650% over current best results is obtained on a real production infrastructure, together with enhanced stability and robustness.